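Collecting and annotating task-oriented dialog data is difficult, especially for highly specific domains that require expert knowledge. At the same time, informal communication channels such as instant messengers are increasingly used at work. As a result, a lot of work-relevant information is disseminated through these channels and has to be post-processed by the employees. To alleviate this problem, we present TexPrax, a messaging system for collecting and annotating problems, causes, and solutions that occur in work-related chats. TexPrax uses a chatbot to engage employees directly, so that they provide lightweight annotations of their conversations and their documentation work is eased. To comply with data privacy and security regulations, we use end-to-end message encryption and give users full control over their data, which has various advantages over conventional annotation tools. We evaluate TexPrax in a user study with German factory employees who ask their colleagues for solutions to problems that arise during their daily work. Overall, we collect 201 task-oriented German dialogues containing 1,027 sentences with sentence-level expert annotations. Our data analysis also reveals that real-world conversations frequently contain instances of code-switching, varying abbreviations for the same entity, and dialects, all of which NLP systems should be able to handle.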
translated by 谷歌翻译
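The increasing deployment in industrial environments of technologies such as Internet of Things (IoT) devices and cyber-physical systems (CPS) is enabling the productive use of machine learning (ML) algorithms in the manufacturing domain. As ML applications move from research to productive use in real industrial environments, the question of reliability arises. Since most ML models are trained and evaluated on static datasets, continuous online monitoring of their performance is required to build reliable systems. Furthermore, concept and sensor drift can degrade the accuracy of an algorithm over time, compromising safety, acceptance, and economics if it goes undetected and is not properly addressed. In this work, we exemplarily highlight the severity of the issue on a publicly available industrial dataset recorded over the course of 36 months, and we explain possible sources of drift. We assess the robustness of ML algorithms commonly used in manufacturing and show that the accuracy of all tested algorithms declines strongly with increasing drift. We further investigate how uncertainty estimation can be leveraged for online performance estimation and for drift detection, as a first step towards continually learning applications. The results indicate that ensemble algorithms such as random forests show the least decay of confidence calibration under drift.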
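As an illustration of how confidence calibration and label-free performance estimation can be quantified, the following is a minimal sketch, not the authors' implementation. It assumes a classifier that outputs a confidence in [0, 1] for its predicted class. The function names, the equal-width binning, and the use of expected calibration error (ECE) as the calibration measure are assumptions made for this example.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE: sample-weighted mean |confidence - accuracy| over equal-width bins.

    `confidences` are predicted-class probabilities in [0, 1];
    `correct` are booleans saying whether each prediction was right.
    """
    n = len(confidences)
    if n == 0 or n != len(correct):
        raise ValueError("inputs must be non-empty and of equal length")
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes in the last bin
        bins[idx].append((conf, ok))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += len(members) / n * abs(avg_conf - accuracy)
    return ece


def estimated_accuracy(confidences):
    """Label-free accuracy estimate: the mean confidence.

    This is only trustworthy while the model stays well calibrated.
    """
    if not confidences:
        raise ValueError("confidences must be non-empty")
    return sum(confidences) / len(confidences)
```

In a monitoring setting, one could compute the ECE on periodically labeled batches. A rising ECE would suggest that the mean confidence is no longer a reliable stand-in for accuracy.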
Remote sensing of the Earth's surface water is critical in a wide range of environmental studies, from evaluating the societal impacts of seasonal droughts and floods to the large-scale implications of climate change. Consequently, a large literature exists on the classification of water from satellite imagery. Yet, previous methods have been limited by 1) the spatial resolution of public satellite imagery, 2) classification schemes that operate at the pixel level, and 3) the need for multiple spectral bands. We advance the state of the art by 1) using commercial imagery with panchromatic and multispectral resolutions of 30 cm and 1.2 m, respectively, 2) developing multiple fully convolutional neural networks (FCN) that can learn the morphological features of water bodies in addition to their spectral properties, and 3) training FCN that can classify water even from panchromatic imagery. This study focuses on rivers in the Arctic, using images from the Quickbird, WorldView, and GeoEye satellites. Because no training data are available at such high resolutions, we construct them manually. First, we use the RGB and NIR bands of the 8-band multispectral sensors. These trained models all achieve precision and recall above 90% on validation data, aided by on-the-fly preprocessing of the training data specific to satellite imagery. In a novel approach, we then use results from the multispectral model to generate training data for FCN that require only panchromatic imagery, of which considerably more is available. Despite the smaller feature space, these models still achieve precision and recall above 85%. We provide our open-source code and trained model parameters to the remote sensing community, which paves the way to a wide range of environmental hydrology applications at vastly superior accuracies and 2 orders of magnitude higher spatial resolution than previously possible.
The detection of anomalies in time series data is crucial in a wide range of applications, such as system monitoring, health care, or cyber security. The vast number of available methods already makes it hard to select the right method for a given application, and in addition different methods have different strengths, e.g. regarding the type of anomalies they are able to find. In this work, we compare six unsupervised anomaly detection methods of different complexity to answer two questions: Do the more complex methods usually perform better? And are there specific anomaly types that those methods are tailored to? The comparison is done on the UCR anomaly archive, a recent benchmark dataset for anomaly detection. We compare the six methods by analyzing the experimental results at the dataset level and the anomaly-type level after tuning the necessary hyperparameters for each method. Additionally, we examine the ability of individual methods to incorporate prior knowledge about the anomalies and analyze the differences between point-wise and sequence-wise features. We show with broad experiments that the classical machine learning methods outperform the deep learning methods across a wide range of anomaly types.
In Novel Class Discovery (NCD), the goal is to find new classes in an unlabeled set given a labeled set of known but different classes. While NCD has recently gained attention from the community, no framework has yet been proposed for heterogeneous tabular data, even though it is a very common representation of data. In this paper, we propose TabularNCD, a new method for discovering novel classes in tabular data. We show a way to extract knowledge from already known classes to guide the discovery of novel classes in tabular data that contains heterogeneous variables. Part of this process relies on a new method for defining pseudo labels, and we follow recent findings in Multi-Task Learning to optimize a joint objective function. Our method demonstrates that NCD is applicable not only to images but also to heterogeneous tabular data.
Sensor visibility is crucial for safety-critical applications in automotive, robotics, smart infrastructure, and other domains: in addition to object detection and occupancy mapping, visibility describes where a sensor can potentially measure and where it is blind. This knowledge can enhance functional safety and perception algorithms or optimize sensor topologies. Despite its significance, to the best of our knowledge, neither a common definition of visibility nor performance metrics exist yet. We close this gap and provide a definition of visibility, derived from a use case review. We introduce metrics and a framework to assess the performance of visibility estimators. Our metrics are verified with labeled real-world and simulation data from infrastructure radars and cameras: the framework easily identifies false-visible and false-invisible estimates, which are safety-critical. Applying our metrics, we enhance the radar and camera visibility estimators by modeling the 3D elevation of the sensor and objects. This refinement outperforms the conventional planar 2D approach in trustworthiness and thus safety.
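Sensor simulation has emerged as a promising and powerful technique for solving many real-world robotic tasks, such as localization and pose tracking. However, commonly used simulators have high hardware requirements and are therefore used mostly on high-end computers. In this paper, we present an approach for simulating range sensors directly on the embedded hardware of mobile robots that use triangle meshes as environment maps. The library, called Rmagine, allows a robot to simulate sensor data for arbitrary range sensors directly on board via ray tracing. Since robots typically have only limited computational resources, Rmagine aims to be flexible and lightweight while still scaling well to large environment maps. It runs on multiple platforms, such as Nvidia Jetson, by placing a unified API over the specific proprietary libraries provided by the hardware manufacturers. This work is intended to support the future development of robotic applications that depend on the simulation of range data, which previously could not be computed in reasonable time on mobile systems.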
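The core operation behind simulating a range sensor against a triangle mesh is a ray-triangle intersection. The sketch below uses the Möller-Trumbore test in plain Python to compute the range measured along a single ray. It is not Rmagine's API: the function names, the tuple-based vectors, and the brute-force loop over all triangles are assumptions made for illustration, and a practical library would use an acceleration structure instead.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def ray_triangle_range(origin, direction, triangle, eps=1e-9):
    """Distance along a unit-length ray to `triangle`, or None on a miss."""
    v0, v1, v2 = triangle
    e1, e2 = _sub(v1, v0), _sub(v2, v0)
    p = _cross(direction, e2)
    det = _dot(e1, p)
    if abs(det) < eps:          # ray is parallel to the triangle plane
        return None
    inv_det = 1.0 / det
    s = _sub(origin, v0)
    u = _dot(s, p) * inv_det    # first barycentric coordinate
    if u < 0.0 or u > 1.0:
        return None
    q = _cross(s, e1)
    v = _dot(direction, q) * inv_det  # second barycentric coordinate
    if v < 0.0 or u + v > 1.0:
        return None
    t = _dot(e2, q) * inv_det
    return t if t > eps else None     # ignore hits behind the sensor


def simulate_range(origin, direction, mesh, max_range=math.inf):
    """Nearest hit distance over all triangles in `mesh`, or None."""
    nearest = None
    for triangle in mesh:
        t = ray_triangle_range(origin, direction, triangle)
        if t is not None and (nearest is None or t < nearest):
            nearest = t
    if nearest is None or nearest > max_range:
        return None
    return nearest
```

A full scan of a simulated sensor would call `simulate_range` once per beam direction of the sensor model.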
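In principle, applying variational autoencoders (VAEs) to sequential data offers a method for controlled sequence generation, manipulation, and structured representation learning. However, training sequence VAEs is challenging: autoregressive decoders can often explain the data without using the latent space, a failure known as posterior collapse. To mitigate this, state-of-the-art models weaken the powerful decoder by applying uniformly random dropout to the decoder input. We show theoretically that this removes pointwise mutual information provided by the decoder input, which is compensated for by utilizing the latent space. We then propose an adversarial training strategy to achieve information-based stochastic dropout. Compared to uniform dropout on standard text benchmark datasets, our targeted approach increases both sequence modeling performance and the information captured in the latent space.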
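The uniform baseline mentioned above can be sketched in a few lines: each token fed to the decoder is independently replaced by a placeholder with a fixed probability. This illustrates only uniform dropout, not the adversarial, information-based strategy the abstract proposes. The function name, the `<unk>` placeholder, and the `rng` parameter are assumptions made for this example.

```python
import random


def uniform_word_dropout(tokens, rate, unk="<unk>", rng=None):
    """Replace each decoder-input token by `unk` independently with probability `rate`."""
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be in [0, 1]")
    if rng is None:
        rng = random.Random()
    # rng.random() lies in [0, 1), so rate=0 keeps every token and rate=1 drops all.
    return [unk if rng.random() < rate else tok for tok in tokens]
```

Because every position is dropped with the same probability, this baseline removes informative and uninformative tokens alike, which is the property a targeted scheme would change.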
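Estimating causal effects from observational data in the presence of latent variables sometimes produces spurious relationships that can be mistaken for causal ones. This is an important issue in many fields, such as finance and climate science. We propose the Sequential Causal Effect Variational Autoencoder (SCEVAE), a novel method for time series causality analysis under hidden confounding. It is based on the CEVAE framework and recurrent neural networks. The strength of the causal links between confounded variables is calculated using direct causal criteria based on Pearl's do-calculus. We show the efficacy of SCEVAE by applying it to synthetic datasets with both linear and nonlinear causal links. Furthermore, we apply our method to real aerosol-climate observation data. On the synthetic data, we compare our approach to a time series deconfounding method with and without substitute confounders. Comparing both methods to the ground truth, we demonstrate that our method performs better. For the real data, we use expert knowledge of the causal links and show how the use of correct proxy variables aids data reconstruction.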
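An algorithm describes a sequence of steps that transform a problem into its solution. Furthermore, when the inverted sequence is well-defined, we say that the algorithm is invertible. While invertible algorithms can be described in general-purpose languages, such languages generally make no guarantees regarding invertibility, so ensuring invertibility requires additional (and often non-trivial) proof. On the other hand, while reversible programming languages guarantee that their programs are invertible by restricting the permissible operations to those that are locally invertible, writing programs in the reversible style can be cumbersome and may differ significantly from conventional implementations, even when the implemented algorithm is, in fact, invertible. In this paper, we introduce Jeopardy, a functional programming language that guarantees program invertibility without imposing local invertibility. In particular, Jeopardy allows the limited use of uninvertible (and even nondeterministic!) operations, provided that they are used in a way that can be statically determined to be invertible. However, guaranteeing invertibility is not obvious. We therefore outline three approaches that can give a partial static guarantee.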